1 Introduction

Advances in flow and mass cytometry now enable the simultaneous measurement of more than 50 parameters per cell in millions of cells per sample. This high dimensional data poses new demands to existing analysis techniques. Conventional gating, until now the gold standard for cytometry data analysis, faces challenges in scaling for the application of high-dimensional cytometry data. Issues with conventional gating strategies include inefficiency, subjectivity, and a significant risk of missing unknown or minor cell populations. Hence, computational methods for unbiased analysis of high-dimensional cytometry data analysis are required.
Several computational tools for cytometry data analysis are available, primarily designed for the R programming language. Examples include CATALYST / CyTOF workflow, FlowSOM, flowCore, flowAI, and diffcyt. These workflows address specific aspects of high-dimensional cytometry data analysis, including handling flow cytometry standard (FCS) files, preprocessing, dimensionality reduction, clustering based on self-organizing maps, and differential abundance analysis. However, interoperability between these packages is often limited and inefficient, and methods for advanced data analysis such as data integration and trajectory analysis are scarce.
Whereas high-dimensional cytometry data analysis methods are less established, algorithms for analyzing single-cell RNA sequencing (scRNAseq) data are highly developed and offer a host of analysis possibilities. The most commonly used framework for scRNAseq data analysis is Seurat, which covers every aspect of data analysis from preprocessing to a range of downstream analyses, including dimensionality reduction and clustering. A notable advantage of scRNAseq packages is their interoperability, since most third-party tools that offer additional analysis options, such as batch correction or trajectory analysis, integrate seamlessly into the Seurat workflow. However, these scRNAseq analysis algorithms are not readily accessible for the analysis of cytometry data.
With these challenges and possibilities in mind, we developed the R package Seumetry, a flow and mass cytometry analysis framework that offers state-of-the-art analysis options across the whole data life cycle. Seumetry includes a broad range of cytometry-specific algorithms for data handling, while it seamlessly integrates with Seurat, providing access to the latest scRNAseq analysis options.

2 Description of data

To demonstrate the functionality and workflow of Seumetry, we generated a high-dimensional spectral flow cytometry dataset comprising 39 parameters of human intestinal immune cells. The test dataset consists of immune cells isolated from the intestinal mucosa of 7 adult human donors, including two anatomical layers: epithelium and lamina propria. The data was manually pre-gated on single live cells.

3 Setup

Load libraries

# BiocManager::install('EnhancedVolcano')
require(pheatmap)
library(EnhancedVolcano)
library(Seumetry)
library(ggplot2)
library(dplyr)

Download test dataset

# zenodo url
zenodo <- "https://zenodo.org/records/11935872/"
# create data directory
dir.create("data/fcs", recursive = TRUE, showWarnings = FALSE)
# download panel and metadata
download.file(paste0(zenodo, "files/metadata.csv?download=1"), "data/metadata.csv",
    quiet = TRUE)
download.file(paste0(zenodo, "files/panel.csv?download=1"), "data/panel.csv", quiet = TRUE)
# download fcs files
for (file in read.csv("data/metadata.csv")$file_name) download.file(paste0(zenodo,
    "files/", file, "?download=1"), paste0("data/fcs/", file), quiet = TRUE)

Set global ggplot2 options and define theme to make beautiful plots.

# adjust fill and color globally using options
discrete_colors <- c("#5988B2", "#C9635E", "#67976B", "#C88F67", "#A583B0", "#CDB86A",
    "#ABCCE2", "#7F007F", "#C8A5A3", "#B9E1B8", "#C5807B", "#E0CDB1", "#917C6F",
    "#5988B2", "#D5E8AE", "#CC7E78", "#B4C9E3", "#AB8BAF", "#97C496", "#5F6E9D",
    "#B3CC96", "#AF8B99", "#C88F67", "#C16D6B", "#8E8E8E", "#5E93C2", "#FF8FBF",
    "#AFD192", "#8E8E8E", "#8BCF92", "#5F6E9D", "#8E6A8E", "#A88169", "#9D7BAE",
    "#6D8D7E", "#6E6E6E", "#9D4949", "#F2E89C", "#A9FF93", "#93A9C5", "#B76CA8",
    "#D4857B", "#4C0000", "#BDBA88", "#6E6E6E", "#FFDF7F", "#B2B2B2", "#366E5E",
    "#C6C6C6", "#DDBE8F", "#C6C6C6", "#89603A", "#9C7A8D", "#7CCCBE", "#AFFF9F")
options(ggplot2.discrete.colour = discrete_colors)
options(ggplot2.discrete.fill = discrete_colors)
# define new gradient colors
gradient_colors <- colorRampPalette(c("steelblue", "slategray2", "white", "tan1",
    "firebrick3"))(100)
# define a theme to match plots from other packages with Seumetry plots
set_theme <- list(theme_linedraw(), theme(aspect.ratio = 1, panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()))

4 Metadata and panel

Metadata and panel are data.frames required by Seumetry for loading FCS files, creating a Seurat object, and preprocess the data.

metadata <- read.csv("data/metadata.csv")
panel <- read.csv("data/panel.csv")

Metadata: a data.frame with at least 2 columns: “file_name” and “sample_id”. All additional columns are used as metadata columns and are added to the Seurat object.

head(metadata)
##                 file_name   sample_id patient_id tissue layer
## 1 1_Ileum_IEL_CD45pos.fcs 1_Ileum_IEL          1  ileum   IEL
## 2 1_Ileum_LPL_CD45pos.fcs 1_Ileum_LPL          1  ileum   LPL
## 3 2_Ileum_IEL_CD45pos.fcs 2_Ileum_IEL          2  ileum   IEL
## 4 2_Ileum_LPL_CD45pos.fcs 2_Ileum_LPL          2  ileum   LPL
## 5 3_Ileum_IEL_CD45pos.fcs 3_Ileum_IEL          3  ileum   IEL
## 6 3_Ileum_LPL_CD45pos.fcs 3_Ileum_LPL          3  ileum   LPL

Panel: a data.frame with at least 2 columns: “fcs_colname” and “antigen”. The fcs_colname are the names of the channels in the FCS file. Antigen is the desired name for downstream analysis. Additionally, columns for transformation are required. See below (transformation) for details.

head(panel)
##         fcs_colname antigen arcsinh_cofactor biexp_neg biexp_pos biexp_width
## 1             APC.A  IL15RA             6000         0       4.0       -1000
## 2         APC.Cy7.A    CD27             6000         0       4.5       -1000
## 3    APC.Fire.810.A    CCR7             6000         0       4.0       -1000
## 4        APC.R700.A   CD127             6000         0       4.5       -1000
## 5 Alexa.Fluor.647.A    CD1C             6000         1       4.5       -1000
## 6           BB515.A   CD141             6000         1       6.0       -1000

5 Loading FCS files

FCS files are loaded based on *.fcs files in the folder and on “file_name” column in metadata data.frame.
Each cell will receive a unique cell ID based on sample_id provided in metadata.

fcs_fs <- create_flowset("data/fcs", metadata)
fcs_fs
## A flowSet with 14 experiments.
## 
## column names(50): FSC.A FSC.H ... eFluor.506.A Time

It is highly recommended to save the “fcs_fs” object.

6 Create Seurat object

All FCS files will be merged and metadata will be added based on sample_id and metadata data.frame. Raw data is saved in Seurat assay “fcs”. Only Channels will be kept that are present in panel data.frame and channels will be renamed from “fcs_colname” to “antigen”. Channels that were not indicated in panel data.frame are stored in Seurat assay “unused”. All panel data are stored in slot for easy access.

seu <- create_seurat(fcs_fs, panel, metadata, derandomize = FALSE)
seu
## An object of class Seurat 
## 50 features across 1151826 samples within 2 assays 
## Active assay: fcs (39 features, 0 variable features)
##  1 layer present: counts
##  1 other assay present: unused

Derandomization: For mass cytometry data, it might be desired to derandomize data, which will round intensity values up to then nearest whole number (see https://biosurf.org/cytof_data_scientist.html#34_Data_transformations). This is implemented in the create_seurat function by setting derandomize to TRUE.

# plot cellnumbers per sample
plot_cellnumber(seu)

7 Preprocessing

7.1 Bead normalisation

For mass cytometry data, bead normalization may be required. This vignette uses a flow cytometry test dataset, so bead normalization is not applied. Please refer to section “Other features” for a description of bead normalization implementation.

7.2 Compensation

Compensation is based on spillover matrices that can be provided within the FCS file or as a data.frame.

There are different options how to supply the matrix. For details see compensate_data function documentation. Here, we use option 3.
- Option 1) Use external spillover matrix directly: same compensation for all files.
- Option 2) Use spillover matrix saved in FCS files: same compensation for all files.
- Option 3) Use spillover matrix saved in FCS files: compensation matrix used from individual FCS files, thus can be different for each file.

The compensate_data function will use flowCore::compensate() to compensate the raw values and write a new assay into the Seurat object called “comp”. The compensated data are saved in “counts” slot.
Warning: if multiple spillover matrices are present in FCS files, use the correct one!

Here, we use option 3 to compensate the data.

# Check different matrices using:
names(flowCore::spillover(fcs_fs[[1]]))
## [1] "SPILL"      "spillover"  "$SPILLOVER"
# In this case, spillover matrix in column 3 is the correct one.
seu <- compensate_data(fcs_fs, seu, fcs_matrix = 3)
# Check that compensation worked.
plot1 <- plot_cyto(seu, x = "IgD", y = "CD3", assay = "fcs", slot = "counts", scale = "log",
    rasterize = TRUE) + ggtitle("Uncompensated")
plot2 <- plot_cyto(seu, x = "IgD", y = "CD3", assay = "comp", slot = "counts", scale = "log",
    rasterize = TRUE) + ggtitle("Compensated")
plot1 + plot2

7.3 Transformation

The following transformations are possible: arcsinh and biexp.

Transformation is performed on “counts” data of the DefaultAssay of the Seurat object. The DefaultAssay is either “fcs”, “beadnorm”, or “comp” depending on whether bead normalisaton or compensation was performed or not.

The transform_data function will return a Seurat object with transformed data written into the “data” slot.

Both arcsinh and biexp transformation can and should be used with custom parameters. These parameters should be provided in the panel data.frame, which is stored in upon Seurat object creation.

seu <- transform_data(seu, "arcsinh")
plot1 <- plot_cyto(seu, x = "CD4", y = "CD3", assay = "fcs", slot = "counts", scale = "log",
    rasterize = TRUE) + ggtitle("Untransformed")
plot2 <- plot_cyto(seu, x = "CD4", y = "CD3", assay = "comp", slot = "data", rasterize = TRUE) +
    ggtitle("Arcsinh transformation")
plot1 + plot2

7.4 Downsampling

Flow- and Mass cytometry data can contain millions of cells. If the number of cells differs largely between samples, it is advisable to downsample so overall differences are not driven by individual samples with high number of cells. Furthermore, downsampling can decrease computational time, if quick data exploration is desired.
It is highly recommended to save a non-downsampled Seurat object.

# set seed for downsampling for reproducibility
set.seed(42)
# downsample using Seurats subset function
seu <- subset(seu, downsample = 20000)
seu
## An object of class Seurat 
## 89 features across 225882 samples within 3 assays 
## Active assay: comp (39 features, 0 variable features)
##  2 layers present: counts, data
##  2 other assays present: fcs, unused
# plot cellnumbers per sample
plot_cellnumber(seu)

8 Quality control

Cytometry is an inherently noisy technology. There will be events close to the axis or artefacts due to antibody aggregation. Seumetry provides multiple quality control tools to achieve clean data for downstream analysis.

8.1 Removal of outliers

It can occur with cytometry, that some events have extremely negative or positive values. Here, we can remove these outlier events 1) by setting a manual threshold for each channel or 2) by using an automatic removal algorithm based on isolation forest.

8.1.1 Manual removal

Use a named vector to indicate threshold for each channel. The function can take positive or negative thresholds.

# check each channel to determine if they contain outlier events
plot_cyto(seu, "CD4", "CD3", style = "2d_density", assay = "comp", slot = "data")

# manually set thresholds based on cyto plots
thresholds_neg = c(CD3 = -4, CD4 = -2)
thresholds_pos = c(CD3 = 4, CD4 = 4.5)
# remove values above or below thresholds
seu_manual = remove_outliers_manual(seu, thresholds_neg)
seu_manual = remove_outliers_manual(seu_manual, thresholds_pos, negative = FALSE)
plot_cyto(seu_manual, "CD4", "CD3", style = "2d_density", assay = "comp", slot = "data")

8.1.2 Automatic removal

Outliers are detected using an isolation forest. In ungated flow cytometry data, this algorithm will mainly remove axis-near events, dead cells, and doublets.
The threshold at which score an event is regarded an outlier can be supplied, which refers to the approximate depth it takes to isolate an observation. The threshold is usually between 0.6 and 0.8, with an default threshold of 0.7. The lower the threshold, the more cells are labelled as outliers.

The function will save two features for each cell into meta.data of the Seurat Object: 1) the isolation forest score (outlier_score) and whether an event passed the threshold or not (outlier).

# identify outliers (default threshold: 0.7)
seu <- detect_outliers(seu, score_threshold = 0.7)
plot1 <- plot_cyto(seu, x = "CD4", y = "CD3", assay = "comp", slot = "data") + ggtitle("Before removal")
plot2 <- plot_cyto(seu, x = "CD4", y = "CD3", assay = "comp", slot = "data", style = "point",
    color = "outlier_score") + ggtitle("Outlier score")
plot3 <- plot_cyto(seu, x = "CD4", y = "CD3", assay = "comp", slot = "data", style = "point",
    color = "outlier") + ggtitle("Outlier")
plot4 <- plot_cyto(subset(seu, subset = outlier == FALSE), x = "CD4", y = "CD3",
    assay = "comp", slot = "data") + ggtitle("After removal")
plot1 + plot2 + plot3 + plot4

Remove outliers from Seurat object.

# remove outliers from Seurat object
seu <- subset(seu, subset = outlier == FALSE)

8.2 Removal of aggregates

With high-dimensional Flow cytometry, antibody aggregates can occur. These are usually characterized by highly co-linear events that form diagonal structures in XY plots. This algorithm can identify potentially problematic channel combinations and identify aggregates in these channels.

Here is an example of such aggregates:

plot1 <- plot_cyto(seu, x = "CD28", y = "CXCR5")
plot2 <- plot_cyto(seu, x = "CXCR3", y = "CD1C")
plot1 + plot2

The first step is to identify channel combinations that potentially contain aggregates. This is done by selecting only double positive events (default: events > 1) and running a pearson correlation. The default threshold of labelling a channel potentially containing aggregates is pearson R > 0.7. The lower the threshold, the more channels are labelled as potentially containing aggregates.

The function “detect_aggregate_channels” returns a list of 2 matrices and 1 data.frame.
- Matrix 1: pearson correlation matrix.
- Matrix 2: binary correlation matrix based on threshold of pearson R (default threshold = 0.7).
- Data.frame: contains the channel combinations that passed the pearson R threshold.

problem_channels <- detect_aggregate_channels(seu, threshold = 0.7)

These correlation matrices can be plotted, for example, using a heatmap.

pheatmap::pheatmap(problem_channels[[1]], main = "Pearson R", border_color = NA)

pheatmap::pheatmap(problem_channels[[2]], main = "Pearson R>0.7", border_color = NA)

Next, we can plot the channel combinations that potentially contain aggregates and manually assess if these channels are really problematic.

plots <- list()
for (i in 1:nrow(problem_channels[[3]])) plots[[i]] <- plot_cyto(seu, x = problem_channels[[3]][i,
    "Channel_1"], y = problem_channels[[3]][i, "Channel_2"])
do.call(gridExtra::grid.arrange, c(plots, ncol = 4))

rm(plots)

If a channel combination does not look like it contains aggregates, remove that row from the data.frame (problem_channels[[3]]).

Next, we can detect aggregates in the problematic channels using a modified RANSAC algorithm. Each cell will get an aggregate_score, which is the number of channel combinations in which it was labelled as an aggregate. By default, potential aggregates are only labelled real aggregates if they occur in 2 or more channel combinations. The aggregate_score and whether a cell has passed the aggregate_score threshold (default >= 2) is stored in meta.data of the Seurat object.

seu <- detect_aggregates(seu, problem_channels[[3]])
# Check the aggregate removal performance
plots <- list()
for (i in 1:nrow(problem_channels[[3]])) plots[[i]] <- plot_cyto(seu, x = problem_channels[[3]][i,
    "Channel_1"], y = problem_channels[[3]][i, "Channel_2"], style = "point", color = "aggregate_score")
do.call(gridExtra::grid.arrange, c(plots, ncol = 4))

rm(plots)
plots <- list()
for (i in 1:nrow(problem_channels[[3]])) plots[[i]] <- plot_cyto(seu, x = problem_channels[[3]][i,
    "Channel_1"], y = problem_channels[[3]][i, "Channel_2"], style = "point", color = "aggregate")
do.call(gridExtra::grid.arrange, c(plots, ncol = 4))

rm(plots)

If the result is satisfactory, the aggregates can be removed.

seu <- subset(seu, subset = aggregate == FALSE)

It is possible that some aggregates still remain. In that case it is advisable to fine tune the parameters of the aggregate_channels function. Alternatively, remaining artefacts may be removed by detecting outliers again using the isolation forest.

9 Dimensionality reduction & clustering

Dimensionality reduction, clustering & visualizations are accessible via the Seurat workflow. We also integrated a more classic clustering approach for cytometry data: FlowSOM.

9.1 Dimensionality reduction

Multiple reductions are possible via the Seurat workflow: PCA, UMAP or tSNE. If UMAPs and clustering do not separate celltypes well, it can help to only select features for PCA that distinguish expected cell subsets, e.g. CD3, CD4, and CD8 if working with T cells.

Furthermore, input data and data processing can be adjusted:
- No centering (data resembles raw intensities more closely) - No scaling (preserves relationship of absolute marker intensities) - Use (selected) features directly as input

Here, all features are used for scaling, centering and PCA. For UMAP 10 PCs are used.

# scale data
seu <- Seurat::ScaleData(seu, features = row.names(seu), scale = TRUE, center = TRUE)
# run PCA for all features
seu <- Seurat::RunPCA(seu, features = row.names(seu), approx = FALSE)
# run UMAP using all PCs
seu <- Seurat::RunUMAP(seu, dims = 1:10)

Integration of multiple datasets or batch correction methods based on PCA such as CCA integration or harmony can also be used (not done in this tutorial).

All Seurat visualisations are available. For more detail, see https://satijalab.org/seurat/

plot1 <- Seurat::DimPlot(seu, group.by = "sample_id") + set_theme
plot2 <- Seurat::DimPlot(seu, group.by = "layer") + set_theme
plot1 + plot2

Seurat::VlnPlot(seu, features = c("CD8", "CD4"), pt.size = 0) & set_theme & RotatedAxis()

It is recommended to use a custom colorpalette when using UMAP to plot expression of markers.

Seurat::FeaturePlot(seu, features = c("CD8", "CD4")) & scale_color_gradientn(colors = gradient_colors) &
    set_theme

9.2 Clustering

For clustering, multiple methods such as Louvain algorithm are accessible via Seurat:

# run clustering using 10 PCs
seu <- Seurat::FindNeighbors(seu, dims = 1:10)
seu <- Seurat::FindClusters(seu, resolution = 0.5)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 222886
## Number of edges: 4785090
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8505
## Number of communities: 12
## Elapsed time: 130 seconds
Seurat::DimPlot(seu, group.by = "seurat_clusters") + set_theme

We also implemented a FlowSOM and metaclustering via FlowSOM R package

# run flowSOM and metaclustering
seu <- run_FlowSOM(seu, metaclusters = 10, xdim = 10, ydim = 10)
Seurat::DimPlot(seu, group.by = "SOM_cl") + set_theme

Seurat::DimPlot(seu, group.by = "SOM_metacl") + set_theme

9.3 Cluster markers

Not only visualisations, but also other Seurat features are fully functional. For example, FindAllMarkers can be used to help annotate clusters.

markers <- Seurat::FindAllMarkers(seu, group.by = "seurat_clusters", only.pos = TRUE)
top5 = markers %>%
    group_by(cluster) %>%
    top_n(n = 5, wt = avg_log2FC)
Seurat::DotPlot(seu, unique(top5$gene), assay = "comp") + scale_color_gradientn(colors = gradient_colors) +
    theme_linedraw() + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
    RotatedAxis()

10 Differential abundance

Differential abundance (DA) analysis is implemented in the Seumetry workflow based on the edgeR package. DA is done using a generalized linear model (GLM) to model number of cells per group given attribute/feature x condition (e.g. seurat_clusters x treatment) and a likelihood ratio rest (LRT) to make a contrast between all conditions.

Using the function with “check_coef = TRUE” will return all coefficients to design the proper contrast for the DA analysis.

# first check the contrasts
differential_abundance(seu, attribute = "seurat_clusters", group_by = "sample_id",
    formula = as.formula("~0+layer"), check_coef = TRUE)
## [1] "layerIEL" "layerLPL"

Here, we compare IEL vs LPL using a contrast of c(1, -1).

# calculate differential abundance
da_res <- differential_abundance(seu, attribute = "seurat_clusters", group_by = "sample_id",
    formula = as.formula("~0+layer"), contrast = c(1, -1))
head(da_res)
##         logFC    logCPM       LR      PValue        FDR
## 6  -2.3653695 15.002031 8.348496 0.003860068 0.04632081
## 9  -2.7907867 11.477721 4.162709 0.041323267 0.24793960
## 8  -2.1624122 13.303887 2.899620 0.088600437 0.35440175
## 4   1.0558878 16.737871 2.279443 0.131098941 0.39329682
## 11  1.8892620  9.509502 1.569190 0.210324659 0.50477918
## 0  -0.6484498 17.823276 1.309026 0.252571159 0.50514232

The results can be exported as a table and/or plotted, for example, using EnhancedVolcano.

EnhancedVolcano::EnhancedVolcano(da_res, lab = rownames(da_res), drawConnectors = TRUE,
    x = "logFC", y = "FDR", title = "IEL vs LPL", pCutoff = 0.1, FCcutoff = 0.1,
    ylim = c(0, 2.5), xlim = c(-6, 6)) + set_theme

11 Differential expression

In addition to Seurat “FindMarkers” function, Seumetry also includes a feature to find differential expression (DE) of markers based on a pseudobulk analysis. To this end, the median fluorescent intensity is calculated using the Seumetry function “median_expression”. The DE analysis is based on the limma package and a linear model is fit followed by empirical Bayes statistics for differential expression (eBayes).

de_res <- DE_pseudobulk(seu, fixed_vars = "layer", contrast = "layerIEL-layerLPL")
head(de_res)
##              logFC     AveExpr         t    P.Value adj.P.Val         B
## HLA-DR -0.19804703 0.198029707 -2.763518 0.01562595 0.3980175 -3.008844
## CCR6    0.15712748 0.002427018  2.625403 0.02041116 0.3980175 -3.178338
## PD1    -0.07610165 0.165835636 -2.204617 0.04533242 0.5663887 -3.686666
## TCRgd  -0.09599237 0.267799595 -2.070057 0.05809115 0.5663887 -3.844226
## CD28   -0.10715628 0.242052820 -1.616276 0.12909410 0.8376953 -4.343091
## CXCR3   0.24258418 0.642406134  1.504957 0.15530949 0.8376953 -4.455169

The results can be exported as a table and/or plotted, for example, using EnhancedVolcano.

EnhancedVolcano::EnhancedVolcano(de_res, lab = rownames(de_res), drawConnectors = TRUE,
    x = "logFC", y = "P.Value", title = "IEL vs LPL", pCutoff = 0.1, FCcutoff = 0.1,
    xlim = c(-0.5, 0.5), ylim = c(0, 2)) + set_theme

12 Additional visualizations

12.1 Cytometry-like plots

Sometimes it is useful to create cytometry-like plots such as density plots. Seumetry includes various plotting functionalities using “plot_cyto”.

plot_cyto(seu, x = "CD3", y = "CD4")

plot_cyto(seu, x = "CD3", y = "CD4", style = "point", color = "sample_id")

plot_cyto(seu, x = "CD3", style = "density", color = "sample_id")

12.2 Sample-level PCA

A principal component analysis (PCA) based on median marker expression per sample can give a good impression of differences between conditions or replicates. Seumetry includes the “plot_pca” function to compute a sample-level PCA. The PCA can also be grouped by other factors than sample (based on Seurat Object meta.data).

plot_pca(seu)

plot_pca(seu, group_by = "sample_id", color = "layer")

12.3 Frequency plots

Seumetry includes a function to visualise the frequency or proportion of cells based on metadata columns. For example, it can be useful to assess whether the frequency of clusters is stable across different samples or differents between conditions.

plot_frequency(seu, "seurat_clusters", "sample_id")

plot_frequency(seu, "seurat_clusters", "layer")

12.4 Cell numbers

plot_cellnumber(seu, "sample_id")

13 Other features

13.1 Bead normalisation

The test dataset used throughout this vignette is a flow cytometry dataset, bead normalisation is only available for mass cytometry data. Here, we implemented bead normalisation by using a wrapper function for the normCytof() function of the CATALYST package. To showcase the bead normalisation implementation in Seumetry, we use the raw_data provided by CATALYST.

library(CATALYST)
data("raw_data")

Now we need to prepare a panel and add cell IDs to load data into Seumetry.

# prepare panel dataframe
panel_cat <- data.frame(fcs_colname = unname(flowCore::parameters(raw_data[[1]])@data[["name"]]),
    antigen = unname(flowCore::parameters(raw_data[[1]])@data[["desc"]]))
panel_cat <- panel_cat[!is.na(panel_cat$antigen), ]
# give cells an ID (required for Seurat Object creation)
row.names(raw_data[[1]]@exprs) <- paste0("raw_data_1_", 1:nrow(raw_data[[1]]@exprs))
row.names(raw_data[[2]]@exprs) <- paste0("raw_data_2_", 1:nrow(raw_data[[2]]@exprs))
# create seurat + transform (cofactor 5 default)
seu_cat <- create_seurat(raw_data, panel_cat)
# to identify beads, normCytof() requires transformed data
seu_cat <- transform_data(seu_cat, "arcsinh")

Finally we can perform bead normalisation. It will use the DefaultAssay of the Seurat Object and save normalised raw intensities in “beadnorm” assay. Transformation should be done afterwards. Based on CATALYST documentation, parameter k refers to “median window used for bead smoothing” and affects visualizations only.

seu_cat <- bead_norm(seu_cat, beads = "dvs", plot_res = TRUE, k = 50)

Furthermore, the returned Seurat Object contains two new metadata columns:
- beads: events that likely represent beads.
- bead_doublet: events that likely represent beads OR bead-cell doublets.
To remove beads and bead doublets from the object, you can use subset.

seu_cat <- subset(seu_cat, subset = bead_doublet == FALSE)
seu_cat
## An object of class Seurat 
## 123 features across 4851 samples within 3 assays 
## Active assay: beadnorm (56 features, 0 variable features)
##  2 layers present: counts, data
##  2 other assays present: fcs, unused

13.2 Integration of third-party tools

To allow integration with a wide variety of single-cell and cytometry libraries, we implemented a function (convert_seurat) to convert Seurat Objects to SingleCellExperiment, flowFrame, and flowSet objects. These can be used, for example, with batch correction methods (cytoNorm, cyCombine, gaussNorm) or other third-party libraries (diffcyt, CATALYST).

fs <- convert_seurat(seu, to = "FS", split_by = "sample_id")
fs
## A flowSet with 14 experiments.
## 
## column names(39): IL15RA CD27 ... CD11C IgD
ff <- convert_seurat(seu, to = "FF")
ff
## flowFrame object 'FlowFrame from Seurat'
## with 222886 cells and 39 observables:
##        name               desc     range  minRange  maxRange
## $P1  IL15RA              APC.A         6  -3.54119         5
## $P2    CD27          APC.Cy7.A         5  -2.93624         4
## $P3    CCR7     APC.Fire.810.A         5  -2.09615         4
## $P4   CD127         APC.R700.A         6  -1.83344         5
## $P5    CD1C  Alexa.Fluor.647.A         7  -2.89173         6
## ...     ...                ...       ...       ...       ...
## $P35   CD19    Spark.NIR.685.A         5  -3.30207         4
## $P36  CD123 Super.Bright.436.A         7  -3.30280         6
## $P37    CD4     cFluor.YG584.A         5  -2.38044         4
## $P38  CD11C       eFluor.450.A         6  -2.44146         5
## $P39    IgD       eFluor.506.A         6  -3.25182         5
## 290 keywords are stored in the 'description' slot
sce <- convert_seurat(seu, to = "SCE")
sce
## class: SingleCellExperiment 
## dim: 39 222886 
## metadata(0):
## assays(1): counts
## rownames(39): IL15RA CD27 ... CD11C IgD
## rowData names(6): fcs_colname antigen ... biexp_pos biexp_width
## colnames(222886): 1_Ileum_IEL_0 1_Ileum_IEL_6 ... 7_Ileum_LPL_8968
##   7_Ileum_LPL_8969
## colData names(20): orig.ident nCount_fcs ... SOM_cl SOM_metacl
## reducedDimNames(2): pca umap
## mainExpName: NULL
## altExpNames(0):

13.3 Median fluorescent intensity

The Seumetry function “median_expression” can be used similar to Seurat “AverageExpression” function, but uses the median instead of the average. This function can be useful, for example, to plot median expression per cluster or condition.

sample_mfi <- median_expression(seu, group_by = "sample_id")
head(sample_mfi)
##        1_Ileum_IEL 1_Ileum_LPL 2_Ileum_IEL 2_Ileum_LPL  3_Ileum_IEL 3_Ileum_LPL
## IL15RA -0.17231694 -0.20061548  0.08954173  0.11188327 -0.067248831 -0.09096448
## CD27   -0.02178888 -0.05438485  0.01792439 -0.03760991  0.005206601 -0.01125733
## CCR7   -0.06622177 -0.12757206 -0.03389536 -0.04036648 -0.041590060 -0.03376181
## CD127   0.88379189  1.30317430  0.61188462  0.87371806  0.627032455  0.73614481
## CD1C    0.09221767  0.11356827 -0.18390634 -0.18468277 -0.046830410 -0.03263464
## CD141  -0.27618478 -0.67456049 -0.05672522 -0.13408594  0.117741572  0.15489495
##        4_Ileum_IEL  4_Ileum_LPL 5_Ileum_IEL 5_Ileum_LPL  6_Ileum_IEL
## IL15RA -0.10834009 -0.171693316 -0.10544559 -0.15079287 -0.127468283
## CD27    0.02081389  0.002322309  0.01027567  0.09966123 -0.008577949
## CCR7   -0.04065862 -0.049216457 -0.08166193 -0.02445750 -0.041674662
## CD127   0.63483715  0.674048659  0.45256332  0.16745530  0.079980064
## CD1C    0.01583101  0.095398638  0.02178680  0.27022532  0.315312894
## CD141  -0.23034196 -0.247730611 -0.64149652 -0.51838910 -0.377463137
##        6_Ileum_LPL 7_Ileum_IEL 7_Ileum_LPL
## IL15RA -0.13495539 -0.12421760 -0.21286077
## CD27   -0.01004190 -0.01960903 -0.02484947
## CCR7   -0.07653837 -0.06958973 -0.12145540
## CD127   0.90290669  0.28859922  0.16958058
## CD1C    0.27721884  0.20944315  0.32803206
## CD141  -0.42756710 -0.38601093 -0.48452998

13.4 Export of FCS files

Sometimes it can be useful to view specific clusters in external cytometry Software. For this purpose, Seumetry includes a function to export FCS files.

# subset Seurat object to specific cluster
seu_sub <- subset(seu, subset = seurat_clusters == 1)
# export FCS file containing cells from this cluster
export_fcs(seu_sub, filename = "cluster_1.fcs", assay = "fcs", slot = "counts")